
    New Algorithms and Optimizations for Interactive Model Development

    In this thesis, we propose a human-in-the-loop framework for efficient model development over large data sets. In this framework, we apply active learning algorithms to select a small sequence of data instances for the user to label and derive an accurate model from them, while presenting each next instance for review at interactive speed. However, existing active learning techniques often fail to provide satisfactory performance when built over large data sets. Not only do such models often require hundreds of labeled data instances to reach high accuracy, but retrieving the next instance to label can also be time-consuming, which makes them incompatible with the interactive nature of human exploration. To address these issues, we propose the following contributions:
    1) A new theoretical framework that allows for an efficient implementation of the Generalized Binary Search strategy over kernel classifiers. Compared to previous work, our framework offers both strong theoretical guarantees on performance and an implementation that is efficient in time and space.
    2) An optimized version space (VS) algorithm called OptVS that uses the hit-and-run algorithm to sample the version space. We also develop a range of sampling optimizations to improve both sample quality and running time. In practice, we observe that OptVS achieves similar or better performance than state-of-the-art version space algorithms while running under 1 second per iteration at all times.
    3) A new algorithm that leverages the factorization structure provided by the user to create subspaces, and factorizes the version space accordingly to perform active learning in the subspaces. We also provide theoretical results on the optimality of our factorized VS algorithm and optimizations for dealing with categorical variables. Our evaluation results show that, for all user patterns considered, our factorized VS algorithm outperforms non-factorized active learners as well as DSM, another factorization-aware algorithm, often by a wide margin, while maintaining interactive speed.
    4) Following the intuitive reasoning behind the user decision-making process, we develop a new human-inspired classification algorithm, called the Factorized Linear Model (FLM), that decomposes the user interest as a combination of low-dimensional convex objects, resulting in an accurate, efficient, and interpretable classifier. In practice, we observe that the FLM classifier achieves comparable or better performance than SVM and another interpretable model, VIPR, over the majority of user interest patterns while taking only a few minutes to train over a large data set of nearly one million points.
    5) A novel automatically factorized active learning strategy called the Swapping Algorithm. This technique initially employs OptVS to escape the slow convergence of the initial iterations and then swaps to an FLM-based strategy to take advantage of its higher classification accuracy. Our evaluation shows that the Swapping Algorithm achieves similar or better performance than non-factorized active learners while approaching the performance of the explicitly factorized methods.
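    The version space idea behind contributions 1 and 2 can be illustrated with a toy sketch. It assumes linear classifiers through the origin and uses plain rejection sampling as a stand-in for hit-and-run, so it is not the thesis's OptVS implementation; the function names `sample_version_space` and `pick_next` are hypothetical. The query rule is the bisection heuristic: ask about the point on which the sampled consistent classifiers disagree most evenly.

```python
import numpy as np

def sample_version_space(X_lab, y_lab, n_samples, rng):
    """Rejection-sample unit-norm linear classifiers consistent with the labels.

    Assumption: this replaces hit-and-run sampling for illustration only.
    It is simple but becomes slow once the version space is small.
    """
    dim = X_lab.shape[1]
    kept = []
    while len(kept) < n_samples:
        w = rng.normal(size=(1000, dim))
        w /= np.linalg.norm(w, axis=1, keepdims=True)
        # Keep only candidates that classify every labeled point correctly.
        consistent = np.all((w @ X_lab.T) * y_lab > 0, axis=1)
        kept.extend(w[consistent])
    return np.array(kept[:n_samples])

def pick_next(X_pool, hypotheses):
    """Return the pool index whose predicted label splits the samples most evenly."""
    votes = np.sign(X_pool @ hypotheses.T)   # shape (n_pool, n_hypotheses)
    imbalance = np.abs(votes.mean(axis=1))   # 0 = even split, 1 = unanimous
    return int(np.argmin(imbalance))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_lab = np.array([[1.0, 0.2], [-1.0, 0.1]])
    y_lab = np.array([1.0, -1.0])
    H = sample_version_space(X_lab, y_lab, 200, rng)
    X_pool = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    print("next index to label:", pick_next(X_pool, H))
```

    In this toy setup the two labeled points already fix the sign of the first coordinate, so the only informative query is the pool point along the second axis.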

    A Factorized Version Space Algorithm for "Human-In-the-Loop" Data Exploration

    While active learning (AL) has recently been applied to help the user explore a large database to retrieve data instances of interest, existing methods often require a large number of instances to be labeled in order to achieve good accuracy. To address this slow convergence problem, our work augments version space-based AL algorithms, which have strong theoretical results on convergence but are very costly to run, with additional insights obtained in the user labeling process. These insights lead to a novel algorithm that factorizes the version space to perform active learning in a set of subspaces. Our work offers theoretical results on optimality and approximation for this algorithm, as well as optimizations for better performance. Evaluation results show that our factorized version space algorithm significantly outperforms other version space algorithms, as well as a recent factorization-aware algorithm, for large database exploration.

    Analysis of the autoimmune regulator gene in patients with autoimmune non-APECED polyendocrinopathies

    The pathogenesis of autoimmunity derives from a complex interaction of genetic and environmental factors. Autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy (APECED) is a rare autosomal recessive disease caused by mutations in the autoimmune regulator (AIRE) gene. AIRE gene variants and, in particular, heterozygous loss-of-function mutations have also been discovered in organ-specific autoimmune disorders, possibly contributing to their etiopathogenesis. It has been suggested that even predisposition to certain autoimmune conditions may derive from AIRE gene polymorphisms, including S278R and intronic IVS9+6 G>A. In this study we test the hypothesis that AIRE gene variants predispose individuals to associated autoimmune conditions in 41 Italian patients affected by non-APECED autoimmune polyendocrinopathies. We could not detect any heterozygous mutations of the AIRE gene. Although a trend of association was observed, the heterozygous polymorphisms S278R and IVS9+6 G>A were not significantly more prevalent in patients than in controls. Their putative contribution to autoimmune polyendocrinopathies and their predictive value in clinical strategies of disease development could be clarified by analysing a larger sample of diseased patients and healthy individuals.

    Nitro-oleic acid triggers ROS production via NADPH oxidase activation in plants: A pharmacological approach

    Nitrated fatty acids (NO2-FAs) are important signaling molecules in mammals. NO2-FAs are formed by the addition reaction of nitric oxide- and nitrite-derived nitrogen dioxide with unsaturated fatty acid double bonds. The study of NO2-FAs in plant systems constitutes an interesting and emerging area. The presence of NO2-FA has been reported in olives, peas, rice and Arabidopsis. To gain a better understanding of the role of NO2-FA in plant physiology, we analyzed the effects of exogenous application of nitro-oleic acid (NO2-OA). In tomato cell suspensions we found that NO2-OA induced reactive oxygen species (ROS) production in a dose-dependent manner via activation of NADPH oxidases, a mechanism that requires calcium entry from the extracellular compartment and protein kinase activation. In tomato and Arabidopsis leaves, NO2-OA treatments induced two waves of ROS production, resembling plant defense responses. Arabidopsis NADPH oxidase mutants showed that NADPH oxidase isoform D (RBOHD) was required for NO2-OA-induced ROS production. In addition, in isolated Arabidopsis epidermis, NO2-OA induced stomatal closure via RBOHD and RBOHF. Altogether, these results indicate that NO2-OA triggers NADPH oxidase activation, revealing a new signaling role in plants.
    Affiliation: Arruebarrena Di Palma, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Di Fino, Luciano Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Salvatore, Sonia Rosana. University of Pittsburgh; United States
    Affiliation: D'Ambrosio, Juan Martín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Garcia-Mata, Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Schopfer, Francisco Jose. University of Pittsburgh; United States
    Affiliation: Laxalt, Ana Maria. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentin

    Nitro-Oleic Acid Induced Reactive Oxygen Species Formation and Plant Defense Signaling in Tomato Cell Suspensions

    Nitrated fatty acids (NO2-FAs) are formed by the addition reaction of nitric oxide- and nitrite-derived nitrogen dioxide with unsaturated fatty acids. Nitrated fatty acids act as signaling molecules in mammals through the formation of covalent adducts with cellular thiols. The study of NO2-FAs in plant systems constitutes an interesting and emerging area. The presence of NO2-FA has been reported in olives, peas, rice and Arabidopsis. To gain a better understanding of the role of NO2-FA in plant physiology, we analyzed the effects of exogenous application of nitro-oleic acid (NO2-OA) to tomato cell cultures. We found that NO2-OA induced reactive oxygen species (ROS) production in a dose-dependent manner via activation of NADPH oxidases, which requires calcium entry from the extracellular compartment and protein kinase activation, a mechanism that resembles plant defense responses. NO2-OA induced ROS production and the expression of plant defense genes, and led to cell death. The mechanism of action of NO2-OA involves a reduction in the cellular glutathione pool and covalent addition reactions with protein thiols and reduced glutathione. Altogether, these results indicate that NO2-OA triggers responses associated with plant defense, revealing its possible role as a signal molecule in biotic stress.
    Affiliation: Arruebarrena Di Palma, Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Di Fino, Luciano Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Salvatore, Sonia Rosana. University of Pittsburgh; United States
    Affiliation: D'Ambrosio, Juan Martín. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Gergoff Grozeff, Gustavo Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Fisiología Vegetal. Universidad Nacional de La Plata. Facultad de Ciencias Naturales y Museo. Instituto de Fisiología Vegetal; Argentina
    Affiliation: Garcia-Mata, Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentina
    Affiliation: Schopfer, Francisco Jose. University of Pittsburgh; United States
    Affiliation: Laxalt, Ana Maria. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mar del Plata. Instituto de Investigaciones Biológicas. Universidad Nacional de Mar del Plata. Facultad de Ciencias Exactas y Naturales. Instituto de Investigaciones Biológicas; Argentin

    Bounding the Time Delay between High-energy Neutrinos and Gravitational-wave Transients from Gamma-ray Bursts

    We derive a conservative coincidence time window for joint searches of gravitational-wave (GW) transients and high-energy neutrinos (HENs, with energies above 100 GeV) emitted by gamma-ray bursts (GRBs). GRBs are among the most interesting astrophysical sources for coincident detections with current and near-future detectors. We take into account a broad range of emission mechanisms. We take the upper limit of GRB durations as the 95% quantile of the T90 values of GRBs observed by BATSE, obtaining a GRB duration upper limit of ~150 s. Using published results on high-energy (>100 MeV) photon light curves for 8 GRBs detected by Fermi LAT, we verify that most high-energy photons are expected to be observed within the first ~150 s of the GRB. Taking into account the breakout time of the relativistic jet produced by the central engine, we allow GW and HEN emission to begin up to 100 s before the onset of observable gamma-ray photon production. Using published precursor time differences, we calculate an upper bound on the time of precursor activity, finding that 95% of precursors occur within ~250 s prior to the onset of the GRB. Combining these processes, we arrive at a time window of t_HEN - t_GW within [-500 s, +500 s]. The same processes also give an upper bound on the expected time window of GW and/or HEN signals coincident with a detected GRB: t_GW - t_GRB and t_HEN - t_GRB both within [-350 s, +150 s].
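    The abstract does not spell out how the individual bounds add up, so the following is one plausible reading of the arithmetic rather than the authors' stated derivation. It assumes the earliest GW/HEN emission is the precursor lead time plus the jet breakout allowance, and the latest is the end of the GRB. The constant and function names are illustrative.

```python
# All values in seconds, taken from the abstract.
GRB_DURATION = 150     # 95% quantile of BATSE T90 values
JET_BREAKOUT = 100     # GW/HEN emission may start this long before gamma-ray onset
PRECURSOR_LEAD = 250   # 95% of precursors occur within this time before the GRB

def window_relative_to_grb():
    """Earliest and latest GW/HEN time relative to the GRB onset."""
    earliest = -(PRECURSOR_LEAD + JET_BREAKOUT)
    latest = GRB_DURATION
    return earliest, latest

def window_gw_vs_hen():
    """Symmetric bound on t_HEN - t_GW when both signals fall in the same span."""
    earliest, latest = window_relative_to_grb()
    span = latest - earliest
    return -span, span

if __name__ == "__main__":
    print(window_relative_to_grb())  # (-350, 150)
    print(window_gw_vs_hen())        # (-500, 500)
```

    Under these assumptions the sums reproduce both windows quoted in the abstract.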

    Intercalibration of the barrel electromagnetic calorimeter of the CMS experiment at start-up

    Calibration of the relative response of the individual channels of the barrel electromagnetic calorimeter (ECAL) of the CMS detector was accomplished, before installation, with cosmic ray muons and test beams. One fourth of the calorimeter was exposed to a beam of high-energy electrons, and the relative calibration of the channels, the intercalibration, was found to be reproducible to a precision of about 0.3%. Additionally, data were collected with cosmic rays for the entire ECAL barrel during the commissioning phase. By comparing the intercalibration constants obtained with the electron beam data with those from the cosmic ray data, it is demonstrated that the latter provide an intercalibration precision of 1.5% over most of the barrel ECAL. The best intercalibration precision is expected to come from the analysis of events collected in situ during LHC operation. Using data collected with both electron and pion beams, several aspects of the intercalibration procedures based on electrons or neutral pions were investigated.

    Prospective validation of the CLIP score: a new prognostic system for patients with cirrhosis and hepatocellular carcinoma

    Prognosis of patients with cirrhosis and hepatocellular carcinoma (HCC) depends on both residual liver function and tumor extension. The CLIP score includes Child-Pugh stage, tumor morphology and extension, serum alpha-fetoprotein (AFP) levels, and portal vein thrombosis. We externally validated the CLIP score and compared its discriminatory ability and predictive power with those of the Okuda staging system in 196 patients with cirrhosis and HCC prospectively enrolled in a randomized trial. No significant associations were found between the CLIP score and age, sex, or pattern of viral infection. There was a strong correlation between the CLIP score and the Okuda stage. As of June 1999, 150 patients (76.5%) had died. Median survival time was 11 months overall, and it was 36, 22, 9, 7, and 3 months for CLIP categories 0, 1, 2, 3, and 4 to 6, respectively. In multivariate analysis, the CLIP score had additional explanatory power above that of the Okuda stage. This was true both for patients treated with locoregional therapy and for those who were not. A quantitative estimation of 2-year survival predictive power showed that the CLIP score explained 37% of survival variability, compared with 21% explained by the Okuda stage. In conclusion, the CLIP score, compared with the Okuda staging system, gives more accurate prognostic information, is statistically more efficient, and has greater survival predictive power. It could be useful in treatment planning by improving baseline prognostic evaluation of patients with HCC, and could be used in prospective therapeutic trials as a stratification variable, reducing the variability of results owing to patient selection.
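    As a sketch of how such a score is assembled, the snippet below sums points over the four components named in the abstract and looks up the median survival reported for each category. The point values per component are not given in the abstract; they follow the commonly published CLIP definition and should be treated as an assumption here, as should the argument names. The AFP item is passed as a boolean (`afp_high`) to avoid asserting a threshold or unit.

```python
# Assumed point values (commonly published CLIP definition, not stated in the abstract).
CHILD_PUGH_POINTS = {"A": 0, "B": 1, "C": 2}
MORPHOLOGY_POINTS = {
    "uninodular_le_50": 0,    # single nodule, extension <= 50% of the liver
    "multinodular_le_50": 1,  # multiple nodules, extension <= 50%
    "massive_or_gt_50": 2,    # massive tumor or extension > 50%
}

# Median survival in months by CLIP category, as reported in the abstract.
MEDIAN_SURVIVAL_MONTHS = {0: 36, 1: 22, 2: 9, 3: 7}
MEDIAN_SURVIVAL_4_TO_6 = 3

def clip_score(child_pugh, morphology, afp_high, portal_vein_thrombosis):
    """Sum the four component scores; the result ranges from 0 to 6."""
    return (
        CHILD_PUGH_POINTS[child_pugh]
        + MORPHOLOGY_POINTS[morphology]
        + int(afp_high)
        + int(portal_vein_thrombosis)
    )

def reported_median_survival(score):
    """Median survival (months) reported in this cohort for a CLIP score."""
    if not 0 <= score <= 6:
        raise ValueError("CLIP score must be between 0 and 6")
    return MEDIAN_SURVIVAL_MONTHS.get(score, MEDIAN_SURVIVAL_4_TO_6)

if __name__ == "__main__":
    s = clip_score("B", "multinodular_le_50", afp_high=False, portal_vein_thrombosis=False)
    print(s, reported_median_survival(s))  # 2 9
```

    The lookup only restates cohort-level medians from this study; it is not a prediction for an individual patient.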

    Comparison and combination of a hemodynamics/biomarkers-based model with simplified PESI score for prognostic stratification of acute pulmonary embolism: findings from a real world study

    Background: Prognostic stratification is of utmost importance for the management of acute pulmonary embolism (PE) in clinical practice. Many prognostic models have been proposed, but which is the best prognosticator in real life remains unclear. The aim of our study was to compare and combine the predictive values of the hemodynamics/biomarkers-based prognostic model proposed by the European Society of Cardiology (ESC) in 2008 and the simplified PESI score (sPESI).
    Methods: Data records of 452 patients discharged for acute PE from Internal Medicine wards of Tuscany (Italy) were analysed. The ESC model and sPESI were retrospectively calculated and compared using areas under receiver operating characteristic (ROC) curves (AUCs), and finally the combination of the two models was tested in hemodynamically stable patients. All-cause and PE-related in-hospital mortality and fatal or major bleedings were the analyzed endpoints.
    Results: All-cause in-hospital mortality was 25% (16.6% PE-related) in high-risk, 8.7% (4.7%) in intermediate-risk and 3.8% (1.2%) in low-risk patients according to the ESC model. All-cause in-hospital mortality was 10.95% (5.75% PE-related) in patients with sPESI score ≥1 and 0% (0%) in patients with sPESI score 0. Predictive performance of sPESI was not significantly different from that of the 2008 ESC model, both for all-cause mortality (AUC sPESI 0.711, 95% CI: 0.661-0.758 versus ESC 0.619, 95% CI: 0.567-0.670, difference between AUCs 0.0916, p=0.084) and for PE-related mortality (AUC sPESI 0.764, 95% CI: 0.717-0.808 versus ESC 0.650, 95% CI: 0.598-0.700, difference between AUCs 0.114, p=0.11). Fatal or major bleedings occurred in 4.30% of high-risk, 1.60% of intermediate-risk and 2.50% of low-risk patients according to the 2008 ESC model, whereas these occurred in 1.80% of high-risk and 1.45% of low-risk patients according to sPESI, respectively. Predictive performance for fatal or major bleeding was not significantly different between the two models (AUC sPESI 0.658, 95% CI: 0.606-0.707 versus ESC 0.512, 95% CI: 0.459-0.565, difference between AUCs 0.145, p=0.34). In hemodynamically stable patients, the combined endpoint of in-hospital PE-related mortality and/or fatal or major bleeding (adverse events) occurred in 0% of patients with low-risk ESC model and sPESI score 0, whilst it occurred in 5.5% of patients with low-risk ESC model but sPESI ≥1. In intermediate-risk patients according to the ESC model, adverse events occurred in 3.6% of patients with sPESI score 0 and 6.65% of patients with sPESI score ≥1.
    Conclusions: In the real world, the predictive performance of sPESI and the hemodynamics/biomarkers-based ESC model as prognosticators of in-hospital mortality and bleedings is similar. Combining sPESI 0 with the low-risk ESC category may identify patients with a very low risk of adverse events who are candidates for early hospital discharge or home treatment.
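    The combination of the two models in hemodynamically stable patients can be sketched as a small lookup. The sPESI items below (one point each) follow the standard published definition and are not listed in the abstract, so they are an assumption here. The adverse event rates are the ones reported above, and the function and argument names are illustrative.

```python
def spesi(age, cancer, chronic_cardiopulmonary, heart_rate, systolic_bp, sao2):
    """Simplified PESI: one point per item present (assumed standard definition)."""
    return sum([
        age > 80,
        bool(cancer),
        bool(chronic_cardiopulmonary),
        heart_rate >= 110,   # beats per minute
        systolic_bp < 100,   # mmHg
        sao2 < 90,           # arterial oxygen saturation, percent
    ])

# Adverse event rates (percent) reported in the abstract for stable patients,
# keyed by (ESC risk class, sPESI >= 1).
ADVERSE_EVENT_RATE = {
    ("low", False): 0.0,
    ("low", True): 5.5,
    ("intermediate", False): 3.6,
    ("intermediate", True): 6.65,
}

def reported_adverse_event_rate(esc_risk, spesi_score):
    """Look up the rate reported in this cohort for the combined strata."""
    return ADVERSE_EVENT_RATE[(esc_risk, spesi_score >= 1)]

if __name__ == "__main__":
    score = spesi(age=70, cancer=False, chronic_cardiopulmonary=False,
                  heart_rate=80, systolic_bp=120, sao2=96)
    print(score, reported_adverse_event_rate("low", score))  # 0 0.0
```

    These figures describe one retrospective cohort and are shown only to make the stratification logic concrete.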